17 research outputs found

    Spatial features of reverberant speech: estimation and application to recognition and diarization

    Distant-talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and are increasingly used in multiple contexts. The speech signal acquired in such scenarios is reverberant and affected by additive noise. This distortion degrades the performance of speech recognition and diarization systems, creating troublesome human-machine interactions. This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a parameter highly correlated with speech recognition degradation: the clarity index. In addition, a method to provide information about the estimation accuracy is proposed. An analysis of phoneme recognition performance in multiple reverberant environments is presented, from which a confusability metric for each phoneme is derived. This confusability metric is then employed to improve reverberant speech recognition performance. Room acoustic parameters can also be used in speech recognition to provide robustness against reverberation, and a method that exploits clarity index estimates to perform reverberant speech recognition is introduced. Finally, room acoustic parameters can also be used to diarize reverberant speech. A room acoustic parameter is proposed as an additional source of information for single-channel diarization in reverberant environments. In multi-channel environments, the time delay of arrival is a feature commonly used to diarize the input speech; however, the computation of this feature is affected by reverberation. A method is presented to model the time delay of arrival robustly so that speaker diarization is performed more accurately.
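
    For readers unfamiliar with the clarity index named in this abstract, its standard room-acoustics definition is given below. The abstract itself does not state the formula; the symbol h(t) for the room impulse response is an assumption introduced here for illustration, and the thesis estimates the quantity non-intrusively (from speech alone) rather than from h(t).

```latex
% Clarity index with a 50 ms early/late boundary, in dB.
% h(t): room impulse response, with t = 0 at the arrival of the direct sound.
C_{50} = 10 \log_{10}
  \frac{\displaystyle\int_{0}^{50\,\mathrm{ms}} h^{2}(t)\,dt}
       {\displaystyle\int_{50\,\mathrm{ms}}^{\infty} h^{2}(t)\,dt}
  \quad \text{[dB]}
```

    A higher value means that more of the energy arrives early, which generally corresponds to less perceived reverberation.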

    Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics

    Keyword Spotting (KWS) models on embedded devices should adapt quickly to new user-defined words without forgetting previous ones. Embedded devices have limited storage and computational resources, so they cannot save samples or update large models. We consider the setup of embedded online continual learning (EOCL), where KWS models with a frozen backbone are trained to incrementally recognize new words from a non-repeated stream of samples, seen one at a time. To this end, we propose Temporal Aware Pooling (TAP), which constructs an enriched feature space by computing high-order moments of speech features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a Gaussian model for each class on the enriched feature space to make effective use of audio representations. In experimental analyses, TAP-SLDA outperforms competitors on several setups, backbones, and baselines, bringing a relative average gain of 11.3% on the GSC dataset. Comment: INTERSPEECH 202
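
    The idea of pooling high-order temporal statistics and updating a per-class model one sample at a time can be sketched as follows. This is a minimal illustration, not the authors' TAP-SLDA implementation: the choice of four moments (mean, standard deviation, skewness, kurtosis), the function name `temporal_moment_pooling`, and the class `RunningClassMeans` are assumptions, and the sketch tracks only class means, whereas the abstract describes a Gaussian model per class.

```python
import numpy as np


def temporal_moment_pooling(features, eps=1e-8):
    """Pool a (T, D) sequence of frame features into one (4 * D,) vector.

    Assumption: the first four moments are used; the abstract only says
    "high-order moments".
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    centered = features - mean
    std = np.sqrt((centered ** 2).mean(axis=0) + eps)
    z = centered / std
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0)
    return np.concatenate([mean, std, skewness, kurtosis])


class RunningClassMeans:
    """Hypothetical streaming model: one running mean per class, no stored samples."""

    def __init__(self):
        self.means = {}
        self.counts = {}

    def update(self, x, label):
        # Incremental mean update from a single sample.
        x = np.asarray(x, dtype=float)
        if label not in self.means:
            self.means[label] = np.zeros_like(x)
            self.counts[label] = 0
        self.counts[label] += 1
        self.means[label] += (x - self.means[label]) / self.counts[label]

    def predict(self, x):
        # Nearest class mean in Euclidean distance (a simplification).
        x = np.asarray(x, dtype=float)
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
```

    Because each update touches only a mean vector and a counter, memory use does not grow with the number of samples seen, which matches the storage constraint described in the abstract.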

    Investigaciones e investigadores de la UAM (Research and researchers at the UAM)

    In this issue of the journal we continue the section "Research at the Universidad Autónoma de Madrid", which aims to present research from various scientific disciplines that has been carried out or is under way at the UAM. The goal is to describe this work, and with it the content of various branches of knowledge, in a simple and didactic way, thereby fulfilling this journal's inherent purpose of popularizing science and helping to spark ideas or initiatives for later research by young scientists, or by undergraduate and postgraduate students who are willing and able to become scientists. Below are several accounts of research carried out by UAM professors, which were collected in a publication commemorating the university's fortieth anniversary and which cover the following disciplines: Biomedicine, Contemporary History, Chemistry and Food, Mathematics, and Biochemistry.

    ABCB1 C3435T, G2677T/A and C1236T variants have no effect in eslicarbazepine pharmacokinetics

    Eslicarbazepine acetate is a third-generation anti-epileptic prodrug that is quickly and extensively transformed to eslicarbazepine after oral administration. In the majority of patients managed with eslicarbazepine, the reduction in seizure frequency is only partial, and many of them suffer considerable adverse drug reactions (ADRs) that require a change of treatment. P-glycoprotein, encoded by the ABCB1 gene, is expressed throughout the body and can affect the pharmacokinetics of several drugs. In epilepsy treatment, this transporter has been linked to drug-resistant epilepsy, as its expression at the blood-brain barrier conditions drug access to the brain. Therefore, we aimed to investigate the impact of three common ABCB1 polymorphisms (i.e., C3435T or rs1045642, G2677T/A or rs2032582, and C1236T or rs1128503) on the pharmacokinetics and safety of eslicarbazepine. For this purpose, 22 healthy volunteers participating in a bioequivalence clinical trial were recruited. No significant relationship was observed between sex, race, or ABCB1 polymorphism and eslicarbazepine pharmacokinetic variability. In contrast, the ABCB1 C1236T C/C diplotype was significantly related to the occurrence of ADRs: one volunteer with this genotype suffered dizziness, somnolence and hand paresthesia, while no other volunteer suffered any of these ADRs (p < 0.045). To the best of our knowledge, this is the first study published to date evaluating eslicarbazepine pharmacogenetics. Further studies with large sample sizes are needed to compare the results obtained here. G. Villapalos-García is co-financed by Instituto de Salud Carlos III (ISCIII) and the European Social Fund (PFIS predoctoral grant, number FI20/00090). M. Navares-Gómez is financed by the ICI20/00131 grant, Acción Estratégica en Salud 2017–2020, ISCIII

    Holmium:YAG laser ablation of upper urinary tract transitional cell carcinoma with new Olympus digital flexible ureteroscope

    Upper urinary tract transitional (UUTT) cell carcinoma is a relatively uncommon urologic tumor. Its traditional treatment is radical nephroureterectomy. In recent years, however, less invasive treatments, including various nephron-sparing procedures, have become increasingly popular. We report a case of laser ablation of UUTT cell carcinoma using the new Olympus digital flexible ureteroscope (URF-V).

    Reverberant speech recognition exploiting clarity index estimation

    We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best-performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that it outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction in comparison to the best baseline of the challenge.
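
    The two uses of the C50 estimate described in this abstract (appending it to the feature vector and using it to pick an acoustic model) might look roughly like the sketch below. The function names, the per-utterance constant C50 column, and the boundary values in the usage note are all hypothetical; the abstract does not specify how the feature is appended or how reverberation levels are partitioned.

```python
import numpy as np


def append_c50(features, c50_db):
    """Append an utterance-level C50 estimate (dB) as an extra feature column.

    features: (T, D) array of per-frame ASR features.
    Returns a (T, D + 1) array. Assumption: one C50 value per utterance.
    """
    features = np.asarray(features, dtype=float)
    column = np.full((features.shape[0], 1), float(c50_db))
    return np.hstack([features, column])


def select_model_index(c50_db, boundaries_db):
    """Pick which acoustic model to use from a C50 estimate.

    boundaries_db: ascending C50 thresholds (hypothetical values) that split
    the reverberation range into len(boundaries_db) + 1 classes.
    Returns the index of the class that contains c50_db.
    """
    return int(np.searchsorted(np.asarray(boundaries_db, dtype=float),
                               c50_db, side="right"))
```

    For example, with illustrative boundaries of 5 dB and 15 dB, an estimate of 10 dB falls in the middle class, so `select_model_index(10.0, [5.0, 15.0])` selects the second of three models.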

    Analysis of prediction intervals for non-intrusive estimation of speech clarity index

    We present an analysis of prediction intervals for a non-intrusive method to estimate the clarity index (C50). The method employed to estimate C50 is a data-driven approach that extracts multiple features from a reverberant speech signal, which are then used to train a bidirectional long short-term memory model that maps the feature space to the target C50 value. The prediction intervals are derived from the standard deviation of the per-frame C50 estimates. This approach was shown to provide a coverage probability of 80%, i.e. the ground truth lies within the estimated interval 80% of the time, where the interval bounds are computed using 5.6 times the standard deviation of the per-frame estimates. This accuracy is shown to be consistent with other noisy reverberant environments.
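
    A prediction interval of this kind, and the coverage probability used to assess it, can be sketched as below. The abstract gives only the 5.6 multiplier; centring the interval on the mean of the per-frame estimates and treating 5.6 times the standard deviation as the half-width are assumptions made here, as are the function names.

```python
import numpy as np


def c50_prediction_interval(per_frame_c50, k=5.6):
    """Return (lower, upper) bounds in dB from per-frame C50 estimates.

    Assumptions: the interval is symmetric around the mean of the per-frame
    estimates and its half-width is k times their standard deviation.
    """
    estimates = np.asarray(per_frame_c50, dtype=float)
    center = estimates.mean()
    half_width = k * estimates.std()
    return center - half_width, center + half_width


def coverage_probability(intervals, ground_truth):
    """Fraction of utterances whose true C50 lies inside its interval."""
    hits = [lo <= truth <= hi
            for (lo, hi), truth in zip(intervals, ground_truth)]
    return float(np.mean(hits))
```

    Computing `coverage_probability` over a test set is how a figure such as the 80% reported above would be checked; a wider multiplier `k` raises coverage at the cost of less informative intervals.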